It is not growing like a tree ...

... In small proportions we just beauties see; - Ben Jonson.

INTRODUCTION 

The design of an application of artificial intelligence to a scientific
task such as organic chemical synthesis was the topic of a doctoral
thesis completed in the summer of 1971 (Reference 1). Chemical
synthesis in practice involves i) the choice of molecule to be
synthesized; ii) the formulation and specification of a plan for
synthesis (involving a valid reaction pathway leading from commercial or
readily available compounds to the target compound, with consideration
of feasibility regarding the purposes of synthesis); iii) the selection
of specific individual steps of reaction and their temporal ordering for
execution; iv) the experimental execution of the synthesis; and v) the
redesign of syntheses, if necessary, depending upon the experimental
results. In contrast to the physical synthesis of the molecule, the
activity in ii) above can be termed the 'formal synthesis'. This
development of the specification of syntheses involves no laboratory
technique and is carried out mainly on paper and in the minds of
chemists (and now within a computer's memory!).

IMPORTANCE  AND  DIFFICULTY  OF  CHEMICAL  SYNTHESIS 

The importance of chemical synthesis is undeniable, and there is
emphatic testimony to the high regard in which scientists hold synthesis
chemists. The level of intellectual activity and difficulty involved
in chemical synthesis is illustrated by Vitamin A (an example solved
by our program) and Vitamin B12. Both problems absorbed the efforts
of several teams of expert chemists and held them at bay for over
20 years. Professor R.B. Woodward of Harvard University was awarded
the Nobel Prize in 1965 for his numerous and brilliant syntheses and
their contribution to science.

A  DESIGN  DECISION 

A program has been written to execute a search for chemical
syntheses (i.e. formal syntheses) for relatively complex organic
molecules. Emphasis has been placed on achieving a fast and efficient
practical system that solves interesting problems in organic chemistry.

The choice of design made very early in this project is worth
mentioning. We could have aimed at an interactive system which
would employ a chemist seated at a console guiding the search for
synthesis. The merit of this approach, exemplified by Corey
(Reference 4), lies in the direct interaction between the chemist
and computer, whereby the designers are afforded rapid feedback
allowing the system to evolve into a tool for the chemists. An
obvious shortcoming, however, is that it circumvents the questions
that are very pertinent to artificial intelligence. In contrast,
our approach was to design a non-interactive, batch-mode program with
artificial intelligence aspects built into it. We have tackled the
problem of synthesis discovery chiefly from the vantage point of
artificial intelligence, utilizing the task area only as a vehicle
to investigate the NATURE OF AN APPLICATION OF MACHINE REASONING
WITH AN EXTENSIVE SCIENTIFIC KNOWLEDGE BASE.

Our choice is perhaps vindicated on three counts:

a) it has freed us from the distractions of designing a user
interface, which is not a simple task;

b) it has resulted in a fast system that runs on standard hardware
to be found in nearly every medium-sized computation center, and has
successfully produced several syntheses for each of several complex
molecules;

c) the program works autonomously in searching for solutions and
incorporates into its task several key judgemental capabilities of
a competent synthesis chemist.


TASK ENVIRONMENT

The program accepts as input some representation of the target
compound together with a list of conditions and constraints that must
govern the proposed syntheses (Figure 1). A list of compounds that are
commercially available (along with indications of cost and availability)
can be consulted. A reaction library containing generalized procedures
is supplied to the program. The output is a set of proposed syntheses,
each being a valid reaction pathway from available compounds to the
target molecule. The syntheses are arrived at by means of strategic
exploration of an AND-OR search space. The design of the search strategy
concerns us here.

The search space has characteristics that make the problem a novel
one. Well known search strategies using AND-OR problem solving
trees (Reference 2) concern themselves with either optimal solutions
or minimal effort spent in finding a solution. Heuristic DENDRAL
in its search for a solution has the distinction of knowing that
only one answer is 'the correct answer', and a smaller number of
alternative solutions is commensurate with greater success for the
program. The synthesis program, on the other hand, is not aimed
toward any optimal search or toward 'the best' synthesis (there is
not one). Quite simply, the task of the synthesis search is to
explore alternative routes of synthesis and develop a problem
solving tree rich in information, having several 'good' complete
syntheses. The success of the program is not to be judged solely
on the number or variety of completed syntheses, but with
the understanding that paths of exploration not completed by the
program are very informative as well.

The reader is referred to the Thesis (Reference 1) for a detailed
exposition of the algorithm and of programming details such as chemical
structure representation, representation of reactions, the setup
of a reaction library and a catalog of readily available compounds.
This brief article describes one aspect of the problem that is of
primary significance to those interested in artificial intelligence.
Other topics of interest to be found in the Thesis include:

Elimination of invalid subgoals, Invalidation of subgoals by cost
considerations, Elimination of redundant subgoals and Elimination of
unpromising subgoals.

BASIC CONCEPTS AND TERMS

A sample synthesis problem, deliberately chosen for its
simplicity, is now followed partially through the search for a
solution. The intent of this example is mainly to introduce some
basic concepts and to illustrate terminology. It is not intended
to explicate the complexity of the task area. In dealing with
the example, the hypothetical course of problem solution by a chemist
is given, and the corresponding problem solving components of the
program are presented alongside. It should be mentioned that this
problem has been solved by the program (with facility).

Consider that a synthesis is required for a compound whose structural
formula is as shown below.

[Structural formula of the target compound]

Chemists also accept a stylized version of the same diagram:

[Stylized structural formula of the target compound]


The usual representation of chemical structures for program
manipulation involves a list with each item representing an atom and its
connections to other atoms by bonds. We have designed a variant of the
connection list to suit the manipulations relevant to synthesis. This
variant will be referred to as the TOPOLOGICAL STRUCTURE DESCRIPTION for
a compound. Details of this representation and its manipulation are
described in the Thesis (Reference 1) and are not needed to understand
this paper.
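
As an illustration only, a plain connection list of the usual kind
(not the TOPOLOGICAL STRUCTURE DESCRIPTION itself, whose details are in
Reference 1) might be sketched in Python as follows. The `Atom` class,
the `add_bond` helper, the atom numbering and the small example fragment
are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One item of a connection list: an atom and its bonds to other atoms."""
    element: str
    # Maps the index of a neighbouring atom to the bond order (1, 2 or 3).
    bonds: dict = field(default_factory=dict)


def add_bond(atoms, i, j, order=1):
    """Record a bond symmetrically on both atoms (hypothetical helper)."""
    atoms[i].bonds[j] = order
    atoms[j].bonds[i] = order


# Hypothetical fragment C-C(=O)-C=C with hydrogens omitted.
atoms = [Atom("C"), Atom("C"), Atom("O"), Atom("C"), Atom("C")]
add_bond(atoms, 0, 1)
add_bond(atoms, 1, 2, order=2)  # carbonyl C=O
add_bond(atoms, 1, 3)
add_bond(atoms, 3, 4, order=2)  # olefin C=C
```

Each atom carries its own neighbour table, so the connections of any
atom can be read off without scanning the rest of the list.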

The chemist examines the molecule and recognizes several
structural features such as the presence of the six-membered ring with
three internal double bonds (usually called the phenyl group). Other
noticeable features are the ketone, -C(=O)-, and the olefin bond,
-CH=CH-. What is defined as a feature depends upon the purpose of the
examination and the chemical knowledge one possesses. We use the term
SYNTHEME to refer to the structural features of a molecule that are
relevant to its synthesis.

The program examines the topological structure description and,
through graphical pattern matching techniques, develops an ATTRIBUTE
LIST consisting of a list of synthemes for the molecule.
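
A much simplified sketch of how an attribute list could be derived
from bond data is given below. It is not the program's pattern matcher:
the function name `find_synthemes`, the bond encoding and the label
KETONE are assumptions, and the sketch makes no attempt to recognize
rings such as the phenyl group.

```python
def find_synthemes(elements, bonds):
    """Return an attribute list of (syntheme name, atom indices) pairs.

    elements: list of element symbols indexed by atom number.
    bonds: dict mapping frozenset({i, j}) to a bond order.
    """
    attributes = []
    for pair, order in sorted(bonds.items(), key=lambda kv: sorted(kv[0])):
        i, j = sorted(pair)
        kinds = {elements[i], elements[j]}
        if order == 2 and kinds == {"C"}:
            attributes.append(("OLEFIN BOND", (i, j)))
        elif order == 2 and kinds == {"C", "O"}:
            # Count the carbonyl as a ketone only when its carbon has
            # exactly two carbon neighbours.
            c = i if elements[i] == "C" else j
            carbon_neighbours = [
                n for p in bonds if c in p for n in p
                if n != c and elements[n] == "C"
            ]
            if len(carbon_neighbours) == 2:
                attributes.append(("KETONE", (i, j)))
    return attributes


# Hypothetical fragment C-C(=O)-C=C with hydrogens omitted.
elements = ["C", "C", "O", "C", "C"]
bonds = {
    frozenset({0, 1}): 1,
    frozenset({1, 2}): 2,
    frozenset({1, 3}): 1,
    frozenset({3, 4}): 2,
}
attribute_list = find_synthemes(elements, bonds)
```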

Among the features of the molecule, the phenyl group is very
stable and occurs in many commercially available compounds. Thus,
in seeking ways to synthesize this compound the chemist considers
the ketone and olefin bond, and not the benzene ring, as possible
reactive sites.

The chemist knows of several reactions that can synthesize an
olefin bond and several that can synthesize the ketone syntheme.
He can consider each of these as trial last steps of the synthesis
sequence he is seeking.

The program is provided with a collection of reaction schemata
called the REACTION LIBRARY. The reaction schemata are grouped
into reaction chapters according to the syntheme they synthesize.
Each reaction schema is provided with a set of tests to be performed
on the target molecule and structural patterns for the target and
subgoal molecules. The tests embody many of the chemical heuristics
that guide the program. Based on the results of some of the tests
the program may reject the reaction schema. Each schema has an
a priori assignment of merit rating. Based on the results of other
tests the program may alter the merit rating to reflect the suitability
of the schema to the specific target molecule.
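
The two roles of the tests, rejection and merit adjustment, can be
sketched as below. This is a minimal illustration under stated
assumptions: the class `ReactionSchema`, the additive adjustment, the
clamping to a 0-10 scale and the example tests are hypothetical and
carry no chemical meaning; the target is reduced to a set of syntheme
names.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

Test = Callable[[FrozenSet[str]], bool]


@dataclass
class ReactionSchema:
    name: str
    a_priori_merit: float
    # If any of these tests fires, the schema is rejected for the target.
    rejection_tests: List[Test] = field(default_factory=list)
    # Each pair is (test, adjustment added to the merit when the test fires).
    merit_tests: List[Tuple[Test, float]] = field(default_factory=list)

    def rate(self, target_attributes: FrozenSet[str]) -> Optional[float]:
        """Return the adjusted merit, or None when the schema is rejected."""
        if any(test(target_attributes) for test in self.rejection_tests):
            return None
        merit = self.a_priori_merit
        for test, adjustment in self.merit_tests:
            if test(target_attributes):
                merit += adjustment
        return max(0.0, min(10.0, merit))


# Hypothetical schema; the attribute names are placeholders.
schema = ReactionSchema(
    name="hypothetical olefin-forming schema",
    a_priori_merit=6.0,
    rejection_tests=[lambda attrs: "SENSITIVE GROUP" in attrs],
    merit_tests=[(lambda attrs: "KETONE" in attrs, -2.0)],
)
```

Keeping rejection separate from rating means a rejected schema never
receives a merit at all, rather than a merit of zero.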

We may represent the alternative courses of syntheses developed
for the target molecule by a PROBLEM SOLVING GRAPH (Figure 3). The
target molecule is a node at the top. A series of arrows leads from
the target through the chapter, attribute and schema layers to the
subgoal layer. Each subgoal consists of one or more conjoined
compounds -- implying that they all enter the reaction to generate the
target molecule. Thus, the compound layer is an AND-layer in this
AND-OR graph.

If all the compounds needed for any one subgoal are available
commercially, we would consider that we know a plausible single-step
synthesis for the target molecule. Any compound generated as a subgoal
which is not commercially available needs to be synthesized and
can be considered in turn as a target molecule.

Repeating  the  above  considerations  with  the  new  target  molecule 
will  open  the  path  for  multi-step  syntheses.  The  problem  solving 
graph  branches  downward  like  a  tree  whereby  each  path  represents 
a  possible  course  of  synthesis  for  the  target  molecule. 
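
The AND-OR reading of the graph can be made concrete with a small
recursive check. The sketch below is an assumption-laden illustration,
not the program's algorithm: compounds are plain strings,
`subgoals_of` is a hypothetical table of alternative subgoals, and the
depth limit exists only to guarantee termination.

```python
def can_synthesize(compound, available, subgoals_of, depth_limit=9):
    """True if some path of the AND-OR graph ends in available compounds."""
    if compound in available:
        return True
    if depth_limit == 0:
        return False
    # OR over the alternative subgoals of a compound,
    # AND over the conjoined compounds within one subgoal.
    return any(
        all(
            can_synthesize(c, available, subgoals_of, depth_limit - 1)
            for c in subgoal
        )
        for subgoal in subgoals_of.get(compound, [])
    )


# Hypothetical data: T can be made from X + P, and X from Q + R.
available = {"P", "Q", "R"}
subgoals_of = {
    "T": [("Y",), ("X", "P")],
    "X": [("Q", "R")],
    "Y": [],
}
result = can_synthesize("T", available, subgoals_of)
```

Here the subgoal ("Y",) fails because Y has no subgoals and is not
available, while ("X", "P") succeeds through a two-step synthesis.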

The above presentation is not to imply that a chemist actually
follows the steps shown in devising syntheses. The method of
reasoning analytically from the target molecule in a sequence of steps,
ending up in available compounds, is but one technique in the vast
repertoire a chemist usually possesses. However, the analytic search
procedure is amenable to convenient computer implementation and is
suitable for investigating a very large class of synthesis problems.
The solution scheme is described in the next section.


SOLUTION SCHEME

The problem lends itself to an analytic search procedure.
The search begins at the target molecule and the last step of the
synthesis is the first to be discovered, the next to the last step
is found second, and so on. Thus the discovery sequence is
the reverse of the synthesis sequence.

The GOAL is given to the program as a chemical structure
description. The description, whether given as a canonical compact
linear notation (Wiswesser Notation, Reference 3) or as a topological
structure description, gives information about what atoms are present in
the molecule and how they are connected.

The structure of the molecule is then examined to identify its
SYNTHEMES, such as the presence of certain types of bonds, the
occurrence of certain groups of atoms and generally the substructures
of given types. Such information is collected automatically into
an ATTRIBUTE LIST.

A large set of chemical reactions (over 100) is compiled
and each reaction is schematized to be usable as an OPERATOR in
developing the search space. In using the reaction schema as an
operator the reaction is used in its inverse direction (i.e. from
the reaction product to the reactant), analogous to the use of a rule
of logical deduction in its inverse direction in a theorem proving
task.

The collection of reaction schemata is known as the REACTION
LIBRARY. The reaction library is arranged as several CHAPTERS, each
containing reaction schemata that are relevant to or affect a syntheme
of the target molecule -- the theme of the chapter.

Each reaction schema has detailed TESTS OF RELEVANCE and TESTS
OF APPLICABILITY toward the target molecule. The tests are
performed before the operator is employed. The application of an
operator on a specific attribute of a molecule results in one or more
subgoals. Each subgoal in turn has one or more CONJOINED molecules
to be used together in the reaction. A subgoal thus generated is
further subject to TESTS OF VALIDITY. The distinction between the
two sets of tests is that one set is conducted on the
target molecule, whereas the other set is conducted on the subgoals after
subgoal generation.

The successive application of operators on the subgoal compounds
and all their subgoals generates the SEARCH SPACE. The strongest
condition for termination of path development is the availability of the
compounds needed. The availability is checked using a compound catalog
of a chemical manufacturing company, a list of about 4000 compounds.

Figures 2 and 3 describe the schematic flowchart of the algorithm
and the five layers of the PROBLEM SOLVING TREE generated in developing
subgoals one level.

SAMPLE PROBLEM AND EFFORT SPENT


It is a matter of considerable difficulty to estimate
the size of the search space either in general or for a specific
example. An attempt is made here, however, to arrive at a figure for
the search space of the compound VITAMIN A. This compound bears a
complex structure (Figure 4) and has held the attention of synthesis
chemists for more than a decade of research effort.


Figure  4.  Structure  of  VITAMIN  A 


There are two synthemes of the molecule for which the program
finds reaction chapters. There are five instances of the syntheme
DOUBLEBOND and one instance of the syntheme ALCOHOL. Thus there
are six attribute nodes in the first level of subgoal generation
(refer to Figure 5). The reaction chapters have five and four reaction
schemata in the respective chapters. One schema is invalid according
to the tests and one schema fails in matching the goal pattern, specified
in the transformation, with the structure of the molecule. After
validating and pruning out duplicates, 43 subgoals are entered in the
problem solving tree to conclude the first level of subgoal
generation. None of these subgoals completes a synthesis for
Vitamin A. Some of the subgoals are of single molecules while others
are of two. There are 52 distinct compounds in the subgoals
and only three of these are found readily available through the
compound catalog.

The program developed the space to a maximum depth of nine
subgoal levels, or (9 times 5 plus 1 =) 46 layers of the problem
solving tree. If the potential problem solving tree were considered
to be branching uniformly at all levels, it would represent a
potential search space of (50)**9 or approximately (10)**12
subgoals. However, the growth of the problem solving tree can be
attenuated strongly by a variety of factors such as the duplication
of subgoal compounds, the completion of syntheses or the reduction
of the number of applicable operators at deeper levels of the tree.
Allowing such attenuation the search space might then be of the
order of (10)**9 subgoals. This estimate is conservative.

The program explored the search space for a time duration of
SIX MINUTES (*) and examined about 120 SUBGOALS. These subgoals
include only those generated from applicable schemata, validated and
retained for further perusal. Of these, over 28 subgoals were
expanded and had subtrees developed for them. At least 6 DIFFERENT
COMPLETED SYNTHESES were extracted from the search tree, and many
more were interesting and near completion. The problem solving tree
actually developed by the program is summarized in Figure 6.


(*) Program written mainly in PL/ONE running on IBM 360/6?
under batch mode.


Figure 6. Synthesis-search tree (schematic) for Vitamin A. Filled-in circles
represent reactants of subgoals selected for further development. Order
of development is indicated by the circled numerals. Compound nodes
connected by a horizontal line segment (as in subgoal 3) are both
required for a given reaction. All generated subgoals on the tree that
were not selected for exploration are represented by a horizontal bar,
with the number of subgoals in the unexplored group indicated under the
bar. Subgoals that were selected for exploration that have no progeny
on the tree (as in subgoal 8) failed to generate any subgoals that could
pass the heuristic tests for admission to the search-tree.


DESIGN OF SEARCH STRATEGY


The importance of guiding the search properly through the
search space cannot be overemphasized. Many a designer of
AI programs has wrestled with the question of what is the 'best'
strategy for guiding heuristic search, taking into account the
characteristics of the space and the requirements on the solution.
The strategies considered vary in their choice of primitives
and their sources of information.

The programmed determination of a search strategy -- an aspect
of what may be termed the PARADIGM ISSUE IN ARTIFICIAL INTELLIGENCE --
is worthy of attention. Although we do not have a program to generate
its own strategy as yet, we do have a program that selects a strategy
suitable for the situation from among prespecified alternatives.

The following strategies can either be observed in the program's
behaviour or can be considered useful for incorporation.




FIXED  STRATEGY  IN  CHEMICAL  SYNTHESIS 

Fixed strategies are useful when one needs to be systematic in
generation. The depth-first and one level breadth-first strategies are
well known and are quite unsuitable for developing syntheses.

However, under most schemes of evaluation and subgoal selection
there are situations when several contenders tie for the highest value.
A fixed strategy is usually pursued in those instances. The synthesis
program will select the latest subgoal first among those whose
priority is not resolved otherwise.
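
The tie-breaking rule can be illustrated with a short sketch. The
function `select_subgoal` and the (merit, name) encoding are
assumptions of this example; only the rule itself, highest merit first
and the latest generated subgoal among equals, comes from the text.

```python
def select_subgoal(candidates):
    """Pick the highest-merit subgoal, preferring the latest among ties.

    candidates: non-empty list of (merit, name) pairs in order of generation.
    """
    best_index = 0
    for index, (merit, _name) in enumerate(candidates):
        # ">=" lets a later subgoal displace an earlier one of equal merit.
        if merit >= candidates[best_index][0]:
            best_index = index
    return candidates[best_index][1]


# Hypothetical subgoals; s2 and s3 tie for the highest merit.
chosen = select_subgoal([(7, "s1"), (9, "s2"), (9, "s3"), (4, "s4")])
```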

Most organic compounds of 'small' size are either available or
can be easily synthesized. When the program encounters small
compounds that are readily available, search is terminated along that
path after assigning a compound merit determined by catalog
entries such as the cost of the substance. Search is terminated for
small compounds even when they are not readily available, with the
computation of the estimated difficulty of their synthesis.

PARTIAL  PATH  EVALUATION  IN  CHEMICAL  SYNTHESIS 

The predominant strategy that the program uses is to evaluate
every path in the search tree leading down from the prime target
molecule and to choose one that gets the highest value. The compounds
that terminate the branched path and the reactions used in every step
enter into computing the value for each path. The program has rules
on computing compound merits, combining merits of conjoined compounds
to get subgoal merits and combining those with reaction merits to
obtain values that can be backed up the tree.

Conjoined subgoal compounds A and B

A   B

Backup Merit for C
    = f( Merit of D, Reaction Merit D --> C )

Backup Merit for B
    = f( Merit of C, Reaction Merit C --> B )

Backup Merit for A
    = f( Merit of E, Merit of F,
         Reaction Merit of E + F --> A )

Backup Merit for Subgoal AB = g( Merit of A, Merit of B )

Presently,  the  functions  f  and  g  simply  multiply  their  arguments 
and  return  the  product  normalized  to  the  scale  0-10.  The  definitions 
are  presently  adequate  but  can  be  changed  easily. 
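
One possible reading of "multiply and normalize to the scale 0-10" is
sketched below. The normalization used here, dividing each merit by 10
before multiplying and scaling the product back by 10, is an
assumption; the exact rule is given in the Thesis (Reference 1). The
numerical merits are invented for the example.

```python
from math import prod

SCALE = 10.0


def combine(*merits):
    """Multiply merits given on the 0-10 scale and return a 0-10 value."""
    return prod(m / SCALE for m in merits) * SCALE


f = combine  # backs a merit up through a reaction step
g = combine  # merges the merits of conjoined compounds into a subgoal merit

# Hypothetical values following the chain D --> C --> B.
merit_c = f(8.0, 5.0)      # merit of D, reaction merit D --> C
merit_b = f(merit_c, 5.0)  # merit of C, reaction merit C --> B
```

With this reading every additional step or conjunct can only lower the
backed-up value, so long or heavily conjoined paths are penalized.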

The selection of a subgoal proceeds from the top of the tree
downward, selecting the subgoal with the highest merit at every level.
However, conjoined compounds represent AND-nodes in this AND-OR tree,
and so the compound with the least merit is chosen from among
conjuncts. This is in accordance with the general strategy of
dealing with AND-OR problem solving graphs.
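
The descent just described, best subgoal at each OR choice and weakest
compound at each AND choice, can be sketched as follows. The dictionary
layout of the tree, the helper names and the product rule used for
`subgoal_merit` are assumptions of this illustration.

```python
def compound(name, merit, subgoals=()):
    """Build a tree node; subgoals is a list of lists of conjoined nodes."""
    return {"name": name, "merit": merit, "subgoals": list(subgoals)}


def subgoal_merit(subgoal):
    """Product of the conjunct merits, kept on the 0-10 scale (assumed rule)."""
    merit = 10.0
    for node in subgoal:
        merit = merit * node["merit"] / 10.0
    return merit


def select_compound(node):
    """Descend from the target to the compound to be developed next."""
    while node["subgoals"]:
        # OR choice: the alternative subgoal with the highest merit.
        best = max(node["subgoals"], key=subgoal_merit)
        # AND choice: the conjunct with the least merit.
        node = min(best, key=lambda c: c["merit"])
    return node["name"]


# Hypothetical tree: subgoal A+B (merit 5.4) beats subgoal C (merit 5.0).
tree = compound("target", 10.0, [
    [compound("A", 9.0), compound("B", 6.0)],
    [compound("C", 5.0)],
])
chosen = select_compound(tree)
```

Working on the weakest conjunct first is a common choice for AND nodes,
since a subgoal cannot succeed unless its hardest member does.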

The evaluation, backup procedure and goal selection are described
in fuller detail in the Thesis (Reference 1).

COMPLEXITY/SIMPLICITY OF SUBGOAL COMPOUNDS

At every stage of evaluation and search continuation, the terminal
nodes of the search tree are compounds. A Graph Traverser-like
strategy will evaluate the terminal nodes and continue search with
one of highest merit. In designing syntheses, the intervening reactions
are as important as the subgoal compounds. Thus this strategy in
itself is unsuitable. But again, among partial paths that get equal
evaluation, it is reasonable to choose those that are terminated
by subgoals of higher merit. (If the subgoal is of higher merit
this would imply that the reactions are poorer on that path; thus
one may actually prefer terminating subgoals with the lowest merit
depending upon solution requirements.)

SIZE  OF  SEARCH  SPACE 

It is also reasonable to use an estimated size of search
that may ensue on different paths, in order to continue search. It
is especially useful when such program resources as time or storage
are dwindling or when the evaluation leaves a LARGE NUMBER of
subgoals of equal priority.

APPLICATION OF KEY TRANSFORMS IN CHEMICAL SYNTHESIS


The democratic tenet "All reactions are created equal" has to be
cast aside, in order to allow preferential treatment for key
transformations. The present reaction library contains a priori merit
ratings of reaction schemata. The merit of each schema is further
adjusted when used, to correspond to the specific application of the
transformation. This technique allows preferred pursuit of paths having
the key transforms.

This a priori preference system can be overridden by the program
under special situations. An example is the technique known to chemists
as BLOCKING or PROTECTION. Blocking of certain structural features
of molecules is a very useful synthesis technique facilitating
solutions to many problems. Sometimes a synthesis without blocking
may not be possible. With reference to Figure 7, the reasoning may
proceed as follows.

[Figure: a subgoal compound with attributes Fa and Fb. Transformation
Ta leads to a simpler subgoal, but the reaction is judged invalid.
Transformation Tb leads to a subgoal where Fb gets BLOCKED, from which
a projected subgoal (simple, valid) follows.]


Figure 7. APPLICATION OF KEY TRANSFORM - BLOCKING


The transformation Ta is a preferred transformation but it is
made inapplicable as functional group Fb is very sensitive to the
reaction, making it invalid. The transformation Tb, which does not
have a priori high merit, however, removes Fb or changes it to Fb';
and Fb' is not sensitive to Ta. Thus the subgoal resulting from Ta can
be terminated. The subgoal from Tb is realized to have higher merit
in this context, because it can now be subject to Ta to yield a simpler
valid subgoal. Such a sophisticated attention refocussing scheme
using contextual evaluation produces excellent results, by overruling
the standard evaluation and forcing development along lines that are
intuitive to the consulting chemist.


SELECTION  AND  ORDERING  OF  ATTRIBUTES 

Some attributes of molecules prove to be more sensitive than
others toward all or most transformations. Thus, while selecting
attributes one may impose an order of preference or one may exclude
certain attributes, saving the effort to be spent on whole chapters
of the reaction library. The a priori ordering of attributes with
due consideration to reactivities is another piece of chemical
knowledge thus available.

Further, a contextual reordering is possible here. Vitamin A,
for example, has four instances of the attribute OLEFIN BOND.
One of the operators results in a smaller but similar compound with
only three OLEFIN BONDs and the reaction itself has high merit.
When continuing search with this new subgoal, a clear indication now
comes from the above observation to prefer to operate on another
OLEFIN BOND. The similarity of the resulting compound also raises
the expectation that successive application of the same transformation
may solve the problem at hand.

KEY  INTERMEDIATE  COMPOUNDS  IN  CHEMICAL  SYNTHESIS  (suggested) 

Some compounds can be changed quickly into a variety of similar
but different compounds and are often used as key intermediate
compounds in synthesis. When a subgoal compound is similar to a
readily available key intermediate, synthesis search may profitably
be geared toward the specific intermediate. On the other hand,
when a key intermediate subgoal is generated that is not available,
a synthesis for that intermediate subgoal is to be actively pursued
with high priority.

USE  OF  ANALOGY  IN  CHEMICAL  SYNTHESIS  (suggested) 

Quite often chemists arrive at syntheses by following the known
synthesis of an analogous compound. Situations where solution
(or simplification) by analogy can be applied arise profusely:
the goal compound is analogous to a compound whose synthesis is
published, a key intermediate can be synthesized by analogy to
an available key intermediate, a subgoal generated is similar to one
or more intermediate compounds generated and solved by the program
during this run alone. However, the advantage of overruling normal
search by reasoning through analogy in these situations is not clear.

It is needless to emphasize that the synthesis of an intermediate
compound solved at one instance in the problem solving tree is available
throughout the course of the program run and is reused by direct
reference.
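
Reuse by direct reference amounts to keeping a table of solved
compounds for the duration of a run. A minimal sketch, assuming a
hypothetical `SynthesisCache` class and a toy search function standing
in for the real search:

```python
class SynthesisCache:
    """Remember each solved compound so that later requests reuse the result."""

    def __init__(self, search):
        self._search = search  # function: compound -> synthesis
        self._solved = {}      # compound -> synthesis found earlier in the run

    def synthesis_for(self, compound):
        if compound not in self._solved:
            self._solved[compound] = self._search(compound)
        return self._solved[compound]


calls = []


def toy_search(compound):
    """Stand-in for the real search; records how often it is invoked."""
    calls.append(compound)
    return ["available precursor", compound]


cache = SynthesisCache(toy_search)
first = cache.synthesis_for("intermediate X")
second = cache.synthesis_for("intermediate X")
```

The second request returns the very same object as the first, and the
search itself runs only once.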

EXTERNAL  CONDITIONS  GUIDING  THE  SEARCH 

There is need for tempering the selection of syntheses with
such considerations as the toxicity of the substances to be
manipulated, special apparatus needed to contain and react gases
and cost associated with expensive commercial compounds, reagents or
catalysts. However, the problem at present is seen as being one of
filtering out syntheses not desired from the output of the program;
this allows a fuller set of prejudices and personal preferences of
chemists to be imposed upon the choice of syntheses.

We have consciously avoided developing an interactive system
where a chemist supplies guidance on-line to the program. Our
interest in the problem is mainly as an AI endeavour and to that
extent our attention was given to designing a good blend of search
strategies, as outlined above, that could effectively substitute for the
chemists' guidance.

REMARKS


The strategies discussed above fall roughly into subgoal-dependence,
transform-dependence and partial-path-dependence. The criteria to
be used in each strategy (the limits, thresholds, orderings and
merit boosts) can have several sources of information (Figure 8).


MODEL  OF  PROBLEM  OR 
OF  SOLUTION  SPACE 

CUMULATED  PAST  EXPERIENCE 


TEMPORARY SETTINGS DERIVED
FROM  KNOWLEDGE  OF 
CURRENT  SESSION 


Figure  8.  SOURCES  OF  INFORMATION  AND  STRATEGIES 


Firstly, quite often the criteria derived from models (implicit or
explicit) are in the form of absolute limits or fixed orderings,
reflecting the static nature of the model one has in mind. In "tuning"
these criteria, one is readjusting the model of the problem or solution
space. Secondly, in certain cases, the program can be delegated the
task of keeping itself tuned with respect to certain criteria, using
cumulated past experience, giving rise to an adaptive (and maybe
learning) characteristic. Thirdly, the contextual evaluations explained
in the last section illustrate how the program can, using knowledge
acquired from the current session, temporarily overrule a model
prescribed to aid it in finding better solutions faster, without
leading to adaptation or adjustment of the model.